A Pilot Detection and Associate Study of Gene Presence-Absence Variation in Holstein Cattle

Simple Summary PAV, or presence-absence variation, means that some individuals have certain genes, while others of the same species do not. This study analyzed PAVs in 173 Holstein bulls using whole-genome sequencing and examined their associations with 46 traits. Out of 28,772 genes, an average of 98.06% were present and 1.94% were absent. A total of 1793 genes were missing in at least one bull, including olfactory receptor (OR) genes, other protein-coding genes, tRNAs, microRNAs, and uncharacterized genes. Core genes (present in all bulls) made up 93.77% of the gene pool, while variable genes included softcore (present in 95–99%), shell (5–94%), and cloud genes (<5%). Cloud genes were linked to hormonal and antimicrobial functions, and shell genes to immune functions. Genetic analysis showed high similarity among the bulls, with few outliers. PAV-based genome-wide association studies (GWAS) found links between PAVs and 15 traits, including milk, fat, and protein yields, health (metritis), and reproduction traits, particularly on chromosomes 15 and 7, involving OR and immune-related genes. This research provides insights into the genetic structures underlying complex traits in Holstein cattle. These findings fill gaps in our understanding and lay the groundwork for integrating PAV into future animal breeding programs as a prediction tool. Abstract Presence-absence variations (PAVs) are important structural variations, wherein a genomic segment containing one or more genes is present in some individuals but absent in others. While PAVs have been extensively studied in plants, research in cattle remains limited. This study identified PAVs in 173 Holstein bulls using whole-genome sequencing data and assessed their associations with 46 economically important traits. Out of 28,772 cattle genes (from the longest transcripts), a total of 26,979 (93.77%) core genes were identified (present in all individuals), while variable genes included 928 softcore (present in 95–99% of individuals), 494 shell (present in 5–94%), and 371 cloud genes (present in <5%). Cloud genes were enriched in functions associated with hormonal and antimicrobial activities, while shell genes were enriched in immune functions. PAV-based genome-wide association studies identified associations between gene PAVs and 16 traits including milk, fat, and protein yields, as well as traits related to health and reproduction. Associations were found on multiple chromosomes, illustrating important associations on cattle chromosomes 7 and 15, involving olfactory receptor and immune-related genes, respectively. By examining the PAVs at the population level, the results of this research provided crucial insights into the genetic structures underlying the complex traits of Holstein cattle.


Introduction
Holstein cows hold significant economic importance in the dairy industry worldwide due to their high efficiency in converting feed into milk.In recent decades, a substantial amount of cattle short-read whole-genome sequence (WGS) data have been generated from different breeds in various geographical locations and environmental conditions [1,2].For example, cattle WGS data have frequently been employed to identify single-nucleotide polymorphisms (SNPs) and insertions/deletions (INDELs) for population genetics and genomewide association studies [3][4][5][6][7].They have also been utilized to investigate CNVs [8][9][10][11][12] and SVs [3,[13][14][15][16].The advent of long-read sequencing technologies has enabled the detection of SVs.However, their utility at the population scale is hindered by their high cost, limited throughput, and requirements for large quantities of DNA [17].
Structural variations (SVs) are genomic alterations larger than 50 bp and include copy number variations (CNVs) and rearrangements (inversions and translocations) [18,19].Presence-absence variations (PAVs) represent a specific form of SV, wherein a genomic segment contains one or more genes that are present in some individuals but absent in others [20].As many PAVs are identified in different species, their occurrence differs across populations and individuals.PAVs are crucial for phenotypic diversity, population adaptation, and evolution studies [21].Therefore, PAVs can help to elucidate certain portions of heritability not captured by SNP-based genome-wide association studies (GWAS) [22].
According to the pangenome definition, the core genome comprises genes present in all individuals, whereas the dispensable genome encompasses unique genes specific to some individuals [21,23].The core and dispensable genomes can vary between closely related species and even among individuals of the same species.The dispensable genome consists of genes with small or rare effects [24], yet they may have significant evolutionary functions and contribute to phenotypic variation [21,25].For instance, in the mussel genome, approximately 38% of the genome was identified as dispensable [26].In chickens, the dispensable genome was ~24% when analyzing 664 individuals from different breeds, showing a moderate core gene content of ~76% [27].In pigs, the dispensable genome was ~17% when analyzing three European commercial breeds and 18 Chinese domestic breeds [28].
Despite extensive research on PAVs in plants [29][30][31][32][33], as well as in other species such as mussels [26,34], chickens [27], and pigs [28], reports on PAVs in cattle are scarce.Moreover, while these data were primarily utilized for SNP and CNV detection, examination of gene PAVs among Holstein dairy cattle has not been reported.The objective of this study was to utilize WGS data from 173 Holstein dairy cattle to identify gene PAVs at the population level.Subsequently, association analyses were conducted to assess their correlation with economically significant traits and to gain insights into their underlying genetic structure.

Phenotypes, dPTA, and Correlation Analysis
Individual trait data for 154 registered Holstein bulls were retrieved from the National Cooperator Database from the U.S. Council of Dairy Cattle Breeding (CDCB).These data were part of the December 2022 genomic evaluations from the CDCB, which routinely calculates predicted transmitting ability (PTA) values for dairy cattle of different breeds.These 46 phenotypes are described in Supplementary Table S1.
De-regressed PTA (dPTA) from all traits were used for PAV-based GWAS and calculated according to the formula: dPTA = PTA/reliability, as previously described [42].Pearson correlations were computed between pairs of dPTA for all 46 phenotypes using R for 154 Holstein bulls.

Gene Presence-Absence Variation Identification
SGSGeneLoss (version 0.1) was used to identify the presence and absence of genes in each sample [43,44].The parameters used were minCov = 5 and lostCutoff = 0.2.A PAV was classified as present if it had more than 20% coverage with at least five reads; otherwise, it was classified as absent.The cattle transcripts (66,384) were filtered to keep only the longest transcript for each of the genes, resulting in 28,772 genes.Following PAV identification, the present genes were classified as follows: core (present in 100% of the accessions), softcore (present in 95-99%), shell (present in 5-94%), and cloud (present in <5%) genes, based on their gene presence frequencies [27,33].

Statistical Overrepresentation Tests of Variable Genes
We conducted a statistical overrepresentation analysis using PANTHER (version 18.0) [45] with the PANTHER Gene Ontology (GO) slim datasets for Biological Process (BP), Molecular Function (MF), Cellular Component (CC), and Reactome.Fisher's exact test, adjusted for the false discovery rate (FDR < 0.05), was employed for this analysis.Specifically, the test was conducted for each variable gene category (softcore, shell, and cloud) to ascertain if there was a significant overrepresentation of any gene category.

Population Genetic Analyses
The identified gene presence-absent variations were further employed in population genetic analyses.We constructed a PAV matrix for the 173 animals and 28,772 genes, followed by principal component analysis (PCA) and neighbor-joining phylogenetic analysis conducted using TASSEL (version 5.0) [46].

Gene PAV-Based GWAS
PAV-based GWAS was conducted with 46 traits (dPTA), including production, body type, reproduction, and health traits from 154 animals (Supplementary Table S1).Prior to analysis, PAVs were filtered based on a Minor Allele Frequency (MAF) > 0.05 and chromosome range (bovine autosomes).The association analysis was performed using TASSEL (version 5.0) [46], employing the general linear model (GLM).A Bonferroni test was applied to establish the genome-wide significance (0.05/number of loci) or suggestive (0.1/number of loci) cut-off thresholds.

Mapping Reads
The sequencing data for samples of 173 Holstein cattle comprised paired-end reads with a maximum length of 151 bp, yielding an average of 617,227,322 raw reads for each sample (Supplementary Table S3).Following the filtering process, an average of 556,623,661 clean reads were obtained, representing an average of 90.18% of the raw data that were used for subsequent analysis (Supplementary Table S3).The clean reads were then aligned to the B. taurus ARS-UCD1.3reference genome assembly [37], resulting in an overall average mapping rate of 99.26% across all 173 samples (Figure 2) This alignment produced an average of 0.20% of singleton reads, 99.07% of paired mapped reads, and 0.74% of unmapped reads (Supplementary Table S3).

Mapping Reads
The sequencing data for samples of 173 Holstein cattle comprised paired-end read with a maximum length of 151 bp, yielding an average of 617,227,322 raw reads for eac sample (Supplementary Table S3).Following the filtering process, an average o 556,623,661 clean reads were obtained, representing an average of 90.18% of the raw dat that were used for subsequent analysis (Supplementary Table S3).The clean reads wer then aligned to the B. taurus ARS-UCD1.3reference genome assembly [37], resulting in a overall average mapping rate of 99.26% across all 173 samples (Figure 2) This alignmen produced an average of 0.20% of singleton reads, 99.07% of paired mapped reads, an 0.74% of unmapped reads (Supplementary Table S3).

Identification of Gene PAVs in Holstein Cattle
Gene PAVs were identified for each sample across bovine autosomes.A total of 28,77 cattle genes (with only the longest transcripts included) were assessed by PAV analysis.A gene was considered present if at least five reads have covered more than 20% of the cu mulative coverage of the exons of each gene, as defined previously [27,33].Otherwise, th gene was considered absent.Based on this definition and considering the 173 animals, o average, 28,214 genes (98.06%) were present in a particular animal, while 558 genes wer absent (1.94%) (Supplementary Table S4).Among the 28,772 cattle genes, 28,465 gene were present in at least one animal, and 26,979 genes were present in all 173 Holstei animals (Supplementary Table S5).The chromosome distribution of the genes present ex hibited a similar pattern when the genes were present in at least one animal vs. present i all 173 animals (Figure 3).BTA3, 7, 19, 5, and 18 harbored a higher number of genes pre sent compared to all other chromosomes.

Identification of Gene PAVs in Holstein Cattle
Gene PAVs were identified for each sample across bovine autosomes.A total of 28,772 cattle genes (with only the longest transcripts included) were assessed by PAV analysis.A gene was considered present if at least five reads have covered more than 20% of the cumulative coverage of the exons of each gene, as defined previously [27,33].Otherwise, the gene was considered absent.Based on this definition and considering the 173 animals, on average, 28,214 genes (98.06%) were present in a particular animal, while 558 genes were absent (1.94%) (Supplementary Table S4).Among the 28,772 cattle genes, 28,465 genes were present in at least one animal, and 26,979 genes were present in all 173 Holstein animals (Supplementary Table S5).The chromosome distribution of the genes present exhibited a similar pattern when the genes were present in at least one animal vs. present in all 173 animals (Figure 3).BTA3, 7, 19, 5, and 18 harbored a higher number of genes present compared to all other chromosomes.KYAT1, TAP2, TUBA3E, and WDR36.The chromosome distribution of absent genes exhibited a similar pattern whether the gene was absent in at least one animal or all 173 animals (Figure 3).A higher number of genes were absent on BTA15, 5, 10, 23, and 3. BTA15 displayed the highest number of genes absent in at least one animal compared to the other chromosomes, with approximately 80% being uncharacterized genes, and 12.6% being olfactory receptor genes (Figure 3B).Additionally, the present genes were classified as core, softcore, shell, and cloud genes present in 100%, 95-99%, 5-94%, and <5% of all animals, respectively.A total of 26,979 (93.77%) core genes are shared by all 173 individuals.Furthermore, 1793 genes are variable (i.e., missing in at least one individual), including 928 softcore, 494 shell, and 371 cloud genes (Figure 4A).Statistical overrepresentation analysis (including GO and Reactome) results revealed the enrichment of variable genes, as shown in Supplementary Table S7.Cloud genes were significantly enriched (FDR < 0.05) in functions associated with hormone and antimicrobial activities, glucocorticoid biosynthesis, regulation of peptidase activity, lipopolysaccharide binding, and antioxidant activity (Figure 4B, Supplementary Table S7).Shell genes were predominantly enriched (FDR < 0.05) with immune system functions (Figure 4B, Supplementary Table S7).Softcore genes showed enrichment (FDR < 0.05) in functions associated with carotenoid and terpenoid biosynthesis, the immune system, regulation of the receptor signaling pathway via JAK-STAT, DAP12, cell communication and signaling, sensory perception, and response to chemicals or stimuli (Figure 4B, Supplementary Table S7).The absent genes in at least one individual animal were merged to create a comprehensive list of all dispensable genes (Supplementary Table S6).A total of 1793 genes were absent in at least one animal, comprising ~69% of uncharacterized genes, ~13% of transfer RNA (tRNAs), ~4% of microRNAs, and ~3% of olfactory receptor genes (Supplementary Table S6).Among these, 51 animals (~30%) exhibited ≥ 2.0% absent genes.Additionally, 307 genes were absent in all 173 animals, including 194 uncharacterized genes, six immune-related genes (IFITM1, IFNB1, IFNL3, IFNT3, CATHL1, and CATHL4), eight olfactory receptor genes (OR10AD1, OR2AG1, OR52J3, OR6C3, etc.), 70 tRNAs genes, one reproduction-related gene (GPX5), and other genes including BMP8A, GML, HBA, HBA1, KYAT1, TAP2, TUBA3E, and WDR36.The chromosome distribution of absent genes exhibited a similar pattern whether the gene was absent in at least one animal or all 173 animals (Figure 3).A higher number of genes were absent on BTA15, 5, 10, 23, and 3. BTA15 displayed the highest number of genes absent in at least one animal compared to the other chromosomes, with approximately 80% being uncharacterized genes, and 12.6% being olfactory receptor genes (Figure 3B).
Additionally, the present genes were classified as core, softcore, shell, and cloud genes present in 100%, 95-99%, 5-94%, and <5% of all animals, respectively.A total of 26,979 (93.77%) core genes are shared by all 173 individuals.Furthermore, 1793 genes are variable (i.e., missing in at least one individual), including 928 softcore, 494 shell, and 371 cloud genes (Figure 4A).Statistical overrepresentation analysis (including GO and Reactome) results revealed the enrichment of variable genes, as shown in Supplementary Table S7.Cloud genes were significantly enriched (FDR < 0.05) in functions associated with hormone and antimicrobial activities, glucocorticoid biosynthesis, regulation of peptidase activity, lipopolysaccharide binding, and antioxidant activity (Figure 4B, Supplementary Table S7).Shell genes were predominantly enriched (FDR < 0.05) with immune system functions (Figure 4B, Supplementary Table S7).Softcore genes showed enrichment (FDR < 0.05) in functions associated with carotenoid and terpenoid biosynthesis, the immune system, regulation of the receptor signaling pathway via JAK-STAT, DAP12, cell communication and signaling, sensory perception, and response to chemicals or stimuli (Figure 4B, Supplementary Table S7).

3.4.PAV Analysis
PCA and neighbor-joining phylogenetic analysis were performed to assess gene diversity based on the identified gene PAVs.Gene PAV-based PCA and phylogenetic analysis revealed high similarity among the 173 Holstein animals (Supplementary Figure S1).Specifically, PCA indicated that seven animals clustered separately from the rest (Supplementary Figure S1A), while the neighbor-joining phylogenetic tree also showed high similarity, with 10 animals exhibiting slightly more divergence (Supplementary Figure S1B).

PAV Analysis
PCA and neighbor-joining phylogenetic analysis were performed to assess gene diversity based on the identified gene PAVs.Gene PAV-based PCA and phylogenetic analysis revealed high similarity among the 173 Holstein animals (Supplementary Figure S1).Specif-ically, PCA indicated that seven animals clustered separately from the rest (Supplementary Figure S1A), while the neighbor-joining phylogenetic tree also showed high similarity, with 10 animals exhibiting slightly more divergence (Supplementary Figure S1B).

Discussion
In this study, we conducted an analysis of sequencing data obtained from 173 Holstein bulls, aiming to identify the PAVs in dairy cattle for the first time.Our analysis revealed gene presence variations across the sampled population.Notably, all animals exhibited a high average mapping rate, with a moderate sequencing coverage of 20×, where 87% of the bulls had a coverage exceeding 15×.A previous study on PAV in chickens utilized a minimal sequencing coverage cut-off of less than 10× [27].Prior research has demonstrated that a sequencing coverage of 10× allows for the recovery of between 98-99.9% of gene PAVs [27,33].
Among the 28,772 cattle genes, the majority were classified as present (>20% of coverage with at least five reads in one animal), with an average of 28,214 present genes (98.06%), while an average of 558 genes were identified as absent.Impressively, 93.77% of these genes, totaling 26,979, were present in all 173 Holstein animals, indicating a substantial proportion of core genes compared to other studies [9,10,34].For example, a study in chickens identified 76% of core genes [27], but unlike our study that analyzed only one cattle breed (Holstein), this chicken study evaluated 664 individuals from five wild subspecies with 28 native breeds and four commercial breeds.Similarly, a study in mussels identified 69% of core gene content from different European populations [26].A study in pigs identified 83.2% of core genes shared by all individuals from three European commercial breeds and 18 Chinese domestic breeds [28].The high levels of core genes observed in our study may be attributed to our focus on a single cattle breed (Holstein), which likely shares a similar genetic background and exhibits high genetic homology.
Despite the predominance of core genes, we identified 1793 variable or dispensable genes (6.24%) across the 173 bulls.While this proportion seems relatively small, of the 28,772 genes, these genes could signify crucial genomic regions contributing to cattle genetic variability and potentially be indicative of environmental adaptation.Comparable findings in humans suggest that about 10% of genes are dispensable [47,48], yet they may play significant roles in genome evolution or phenotypic variation, as demonstrated in previous studies [21,25,49].
The differentiation between core and dispensable genome components is not static, and the genome may undergo alterations due to PAVs or SVs, leading to changes in the proportions of the genome classes and potentially creating a conditionally dispensable genome [21].Although our study revealed a lower content of dispensable genes compared to humans [47,48], chickens [27], mussels [26], or pigs [28], a previous study in cattle reported a similar dispensable genome content when analyzing the whole genome of five cattle assemblies from Angus, Highland, and Original Braunvieh, and their close relative Brahman [50].That study observed that 6.10% of the cattle genome is dispensable (or flexible), albeit without identifying PAVs [50].These findings collectively underscore the importance of understanding both core and dispensable genome components in elucidating genetic diversity and evolutionary dynamics in cattle populations.
In our analysis of 173 Holstein cattle, we identified 1793 variable genes that were not universally present in all individuals.These genes were categorized based on their frequency of occurrence: cloud genes (<5%), shell genes (5-94%), and softcore genes (95-99%).Cloud genes, characterized by their rare occurrence (<5%), were notably enriched in functions related to hormone activity, peptidase activity, regulation of proteolysis, and glucocorticoid biosynthesis.Shell genes, which exhibited intermediate frequency (5-94%), were predominantly associated with immune-related functions.Softcore genes, occurring with relatively higher frequency (95-99%), showed enrichment in various func-tions including immune-related functions, regulation of the JAK-STAT signaling cascade, DAP12 signaling, response to nutrient levels, sensory perception, and others.Importantly, softcore genes showed enrichment in functions related to the innate immune response.Of particular interest is the DAP12 signaling pathway, which is known for its role in innate immunity responses [51].In our study, genes enriched for this pathway included KLRC1, NKG2A, and NRAS.Additionally, the cytokine-activated Janus kinase (JAK)/signal transducer and activator of transcription (STAT) pathway, crucial in modulating immunity and inflammation, has been linked to mastitis resistance and milk production in dairy cattle [52].In our findings, genes associated with this pathway were PRP1 (prolactin-related protein 1), PRP2, and PRP14, suggesting potential implications for immune response and milk production traits.
In addition, softcore genes were identified within functions related to sensory perception (GO:0007600), responses to chemicals (GO:0042221), and the detection of stimulus (GO:0051606).GO BP term pathways were mainly olfactory receptors (ORs) and taste receptor genes.ORs constitute the largest gene family within the mammalian genome, with ~400 functional genes (with intact coding regions) and over 400 pseudogenes (nonfunctional segments) in humans [53][54][55].The composition of OR members vary significantly across species and individuals due to the abundance of pseudogenes, CNVs, deletions, and SNPs [55][56][57].In cattle, 1071 OR genes have been identified, including 881 functional genes, 190 pseudogenes, and 352 partial genes distributed across 26 chromosomes, with BTA15 harboring the highest number of functional genes (251) [58].In our study, a total of 225 OR genes were identified, of which 165 OR are core genes and 62 are variable genes, including 35 softcore, eight cloud, and 19 shell genes.Our analysis revealed that 10 OR genes were absent in all 173 Holstein animals studied.Given that one odor can activate multiple ORs, there certainly exists a degree of redundancy among OR genes.Studies have demonstrated that even subtle functional changes in specific OR genes can profoundly alter odor perception in humans [59][60][61], underscoring the significant impact a single OR may have on odorant perception.
In this study, cattle PAV-based GWAS was performed with 46 phenotypes.Phenotype correlation analysis indicated moderate to strong correlations within each trait group, as described before in cattle [62,63].The gene PAV-based GWAS analysis yielded 18 associations between gene PAVs and traits in dairy cattle.Notably, several OR genes, including OR52N2, OR52E6, LOC785207, and LOC782221, showed associations with several important correlated traits.OR52E6 was linked to MET, HTH, FAT, and EFC, while OR52N2 was associated with PRO, MLK, and MET.LOC785207 (olfactory receptor family 52 subfamily S member 2) was found to be associated with FS, while LOC782221 (olfactory receptor family 8 subfamily B member 1AQ), despite being a pseudogene (being recently updated), showed an association with CT.Although pseudogenes are generally considered non-functional, a human GWAS study revealed associations of 13 pseudogenes with disease susceptibility [64].Despite encoding truncated proteins, pseudogenes can still be transcribed into RNA and may play a role in regulating gene expression [65].Although significant sites may only indicate markers linked to the causative mutation, there is a notable enrichment of pseudogenes and nonfunctional genes within the OR gene repertoire [57].
These associated OR genes belong to the OR52 family, while one is from the OR8 family.Subtypes of the OR52 family in humans are known to recognize carboxylic acids [66], as well as butter-like aromas (butanoic acid, gamma decalactone, and diacetyl) [67].Odor perception is crucial for cattle, influencing their selection of feed, avoidance of dangers, and engagement in reproductive and social behaviors.The high variability observed in the OR gene repertoire in cattle may be attributed to selection pressures and environmental changes [58].In humans, OR genes have been linked to appetite regulation and food intake [68,69].Similarly, in cattle, OR genes have been associated with diverse economic traits such as feed intake [63,70], feed utilization [71], reproduction [72], carcass performance [73], and methane emission [74].This suggests a potential regulation of appetite in cattle by ORs, which could impact feed intake, feeding behavior, body composition, and weight gain [70].Moreover, our gene PAV-GWAS results showed that LOC112442670 (keratin-associated protein 9-7-like) was associated with MFV in dairy cattle.Furthermore, two other immune genes were associated with important traits in dairy cattle.LOC100296997 (T cell receptor alpha variable 14/delta variable 4-like) is associated with LIV, and LOC789175 (beta-defensin 103B-like) is linked to the FLC.β-defensins, prominent antimicrobial peptides, serve as the primary defense against prevalent infections in dairy cattle, including intramammary infections [75].Additionally, β-defensins contribute to regulating epithelial proliferation during the wound-healing process [76].
The LOC112449566 gene (cytochrome P450 11B1, mitochondrial-like) was found to be associated with PL and FUA in dairy cattle.A previous study in dairy cows has identified associations of LOC112449566 with lactose yield in Fleckvieh cattle [77].Additionally, in a recent study of milk production traits, this gene was also associated with fat percentage in Walloon Holstein cows [78].These findings suggest that LOC112449566 is a potential candidate gene in cattle for further investigation into its role in milk production and related traits.
In our study, we also found one adhesion G protein-coupled receptor gene, LOC100337044, located on BTA7, was associated with PL in dairy cattle.G protein-coupled receptors (GPCRs) are cell surface receptors that detect molecules outside the cell, comprising over 750 members in mammals [79].Adhesion G protein-coupled receptors (AGPCRs) constitute a subgroup of GPCRs, consisting of 33 members in humans with diverse expression patterns and functions, which are primarily involved in the regulation of adhesion [80].In humans, the ADGRE3 (adhesion G protein-coupled receptor E3) or EMR3 gene is predominantly expressed by cells of the immune system and plays a role in leukocyte migration [81].A study evaluating differences in the innate immune response in Holstein and Angus cattle found varying levels of methylation in the LOC100337044 gene in fibroblast cultures challenged with E. coli lipopolysaccharide [82].Additionally, another study in cattle identified the LOC100337044 gene associated with protein percentage in Thai dairy cattle [83].These findings underscore the potential significance of AGPCR genes in dairy cattle and warrant further investigation into their roles in various physiological processes and traits.
It is imperative to acknowledge that our ability to detect PAVs was constrained by the limitations of both short-read sequencing and the current linear reference genome assembly.However, advancements such as telomere-to-telomere assembly, bovine pangenome graph construction, and the utilization of long-read sequencing technologies hold promise for uncovering PAVs in previously inaccessible genomic regions.These advancements will undoubtedly enhance our ability to capture a comprehensive spectrum of PAVs in the future.Moreover, conducting functional validation experiments on the identified candidate genes will be essential for future studies to corroborate our findings.

Conclusions
The discovery of PAVs in the Holstein bulls has provided novel insights into the genomic landscape of cattle.Our analysis revealed that nearly 98% of assessed genes were present in the population, while approximately 2% exhibited absence variation, indicating underlying genetic diversity.Core genes, found in all individuals, comprised a substantial portion of all genes; the presence of variable genes such as softcore, shell, and cloud genes suggests genomic adaptability.The enrichment of cloud genes in functions such as hormone activity and antimicrobial peptides implies their role in physiological adaptation, while shell genes enriching immune functions highlight the importance of genetic variability in immune traits and host defense.Despite some genomic variability, the overall genetic similarity suggests homogeneity within the population.Furthermore, in this study, we performed a PAV-based GWAS for 46 important production traits, revealing six PAVs located on BTA7, 14, and 15, which were significantly associated with five traits.These significant associations illustrate the potential impact of candidate genes related to OR, AGPCR, and cytochrome genes on vital production and health traits in Holstein dairy cattle.

Figure 2 .
Figure 2. The sequencing coverage (×) and mapping rate (%) across 173 Holstein animals.Black dots represent the sequencing coverage, and blue line represents the mapping rate.

Figure 2 .
Figure 2. The sequencing coverage (×) and mapping rate (%) across 173 Holstein animals.Black dots represent the sequencing coverage, and blue line represents the mapping rate.

Figure 3 .
Figure 3.The distribution of gene PAVs across bovine autosomes in 173 Holstein cattle.(A) Gene PAVs present in at least one animal.(B) Gene PAVs absent in at least one animal.(C) Gene PAVs present in all 173 animals.(D) Gene PAVs absent in all 173 animals.

Figure 3 .
Figure 3.The distribution of gene PAVs across bovine autosomes in 173 Holstein cattle.(A) Gene PAVs present in at least one animal.(B) Gene PAVs absent in at least one animal.(C) Gene PAVs present in all 173 animals.(D) Gene PAVs absent in all 173 animals.

Figure 4 .
Figure 4. (A) Gene classification in 173 Holstein cattle according to their gene presence frequencies.The present genes were classified as core (present in 100% of the individuals), softcore (present in 95-99%), shell (present in 5-94%), and cloud (present in <5%) genes.(B) The top five Biological Process (BP) GO terms are overrepresented in the variable genes.

Figure 4 .
Figure 4. (A) Gene classification in 173 Holstein cattle according to their gene presence frequencies.The present genes were classified as core (present in 100% of the individuals), softcore (present in 95-99%), shell (present in 5-94%), and cloud (present in <5%) genes.(B) The top five Biological Process (BP) GO terms are overrepresented in the variable genes.

Table 1 .
Summary of the most significant gene PAV effects on 46 dairy traits in Holstein cattle.